R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections

Authors: Huang TonTon Hsien-De; Kao Hung-Yu
Publication date: 15/11/2018

The influence of deep learning on image identification and natural language processing has attracted enormous attention globally. A convolutional neural network, which can learn without prior feature extraction, is well suited to the rapid iteration of Android malware. Traditional solutions for detecting Android malware require continuous learning from pre-extracted features to maintain high detection performance. To reduce the manual effort of feature engineering by removing the need for pre-selected features, we have developed a coloR-inspired convolutional neuRal networks (CNN)-based AndroiD malware Detection (R2-D2) system. The system converts the bytecode of classes.dex from an Android archive file to RGB color codes and stores it as a fixed-size color image. The color image is input to the convolutional neural network for automatic feature extraction and training. The data was collected from Jan. 2017 to Aug. 2017. During this period, we collected approximately 2 million benign and malicious Android apps for our experiments with help from our research partner Leopard Mobile Inc. Our experiment results demonstrate that the proposed system has accurate security analysis on contracts. Furthermore, we keep our research results and experiment materials on http://R2D2.TWMAN.ORG.

Comment: Version 2018/11/15, IEEE BigData 2018, Seattle, WA, USA, Dec 10-13, 2018. (Accepted
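The bytecode-to-image step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact mapping, so the choices here are assumptions: each consecutive group of three bytes becomes one (R, G, B) pixel, and the input is truncated or zero-padded to fill a square image. The function name `bytes_to_rgb_pixels` and the sample bytes are hypothetical.

```python
def bytes_to_rgb_pixels(data: bytes, side: int) -> list:
    """Map a byte string onto a side x side grid of (R, G, B) pixels.

    Assumption: three consecutive bytes form one pixel. The input is
    truncated or zero-padded so it fills exactly side * side pixels.
    """
    needed = side * side * 3
    padded = data[:needed].ljust(needed, b"\x00")
    pixels = [tuple(padded[i:i + 3]) for i in range(0, needed, 3)]
    # Split the flat pixel list into rows of equal length.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical stand-in for the contents of a classes.dex file.
sample = bytes(range(10))
image = bytes_to_rgb_pixels(sample, side=2)
```

With real input, `data` would be the raw bytes read from classes.dex, and the resulting grid could be handed to any image library before being fed to a CNN.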

arXiv.org e-Print Archive

Data-Driven and Deep Learning Methodology for Deceptive Advertising and Phone Scams Detection

Authors: Huang TonTon Hsien-De; Kao Hung-Yu; Yu Chia-Mu
Publication date: 15/10/2017

The advance of smartphones and cellular networks boosts the need for mobile advertising and targeted marketing. However, it also triggers unseen security threats. We found that phone scams using fake calling numbers with very short lifetimes are increasingly popular and have been used to trick users. The harm is worldwide. On the other hand, deceptive advertising (deceptive ads), meaning fake ads that trick users into installing unnecessary apps via either alluring or daunting texts and pictures, is an emerging threat that seriously harms the reputation of the advertiser. Against these two new threats, the conventional blacklist (or whitelist) approach and the machine learning approach with predefined features have proven ineffective. Nevertheless, building on the success of deep learning in developing highly intelligent programs, our system can efficiently and effectively detect phone scams and deceptive ads by taking advantage of our unified framework on deep neural network (DNN) and convolutional neural network (CNN). The proposed system has been deployed for operational use, and the experimental results demonstrate its effectiveness. Furthermore, we keep our research results and release experiment material on http://DeceptiveAds.TWMAN.ORG and http://PhoneScams.TWMAN.ORG if there is any update.

Comment: 6 pages, TAAI 2017 versio

arXiv.org e-Print Archive

ELECTRA is a Zero-Shot Learner, Too

Authors: Kao Hung-Yu; Ni Shiwen
Publication date: 20/07/2022

Recently, for few-shot or even zero-shot learning, the new paradigm "pre-train, prompt, and predict" has achieved remarkable results compared with the "pre-train, fine-tune" paradigm. After the success of prompt-based GPT-3, a series of masked language model (MLM)-based (e.g., BERT, RoBERTa) prompt learning methods became popular and widely used. However, another efficient pre-trained discriminative model, ELECTRA, has probably been neglected. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using our novel replaced token detection (RTD)-based prompt learning method. Experimental results show that the ELECTRA model based on RTD-prompt learning achieves surprisingly strong, state-of-the-art zero-shot performance. Numerically, compared to MLM-RoBERTa-large and MLM-BERT-large, our RTD-ELECTRA-large has an average improvement of about 8.4% and 13.7%, respectively, across all 15 tasks. In particular, on the SST-2 task, our RTD-ELECTRA-large achieves an astonishing 90.1% accuracy without any training data. Overall, compared to the pre-trained masked language models, the pre-trained replaced token detection model performs better in zero-shot learning. The source code is available at: https://github.com/nishiwen1214/RTD-ELECTRA.
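One way to picture RTD-based prompting is the following sketch. It is an interpretation, not the authors' code: it assumes each candidate label word is placed into a prompt template, a discriminator scores how likely that word is to be a "replaced" token, and the label whose word looks most "original" wins. The names `rtd_zero_shot` and `toy_replaced_prob`, the template, and the label words are hypothetical, and the toy scorer merely stands in for a real ELECTRA discriminator.

```python
from typing import Callable, Dict


def rtd_zero_shot(
    text: str,
    template: str,
    label_words: Dict[str, str],
    replaced_prob: Callable[[str, str], float],
) -> str:
    """Pick the label whose word is least likely to be flagged as replaced."""
    scores = {}
    for label, word in label_words.items():
        prompt = template.format(text=text, word=word)
        scores[label] = replaced_prob(prompt, word)
    return min(scores, key=scores.get)


def toy_replaced_prob(prompt: str, word: str) -> float:
    """Hypothetical stand-in for a discriminator's 'replaced' probability."""
    has_positive_cue = any(cue in prompt for cue in ("enjoyed", "loved"))
    fits_context = (word == "great") == has_positive_cue
    return 0.1 if fits_context else 0.9


labels = {"positive": "great", "negative": "terrible"}
prediction = rtd_zero_shot(
    "I enjoyed this film.", "{text} It was {word}.", labels, toy_replaced_prob
)
```

In practice `replaced_prob` would wrap a pre-trained discriminator and return its score at the position of the label word; no parameter updates are involved, which is what makes the setting zero-shot.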

arXiv.org e-Print Archive

Robustness Study of Free-Text Speaker Identification and Verification

Author: Kao Yu-Hung
Publication date: 01/01/1993

Usable free-text speaker identification and verification systems must exhibit robustness under varying operational conditions. We studied the degree of robustness provided by various signal processing techniques: spectrum subtraction, bandpass liftering, RASTA filtering, ISDCN, and stereo database normalization. The experiments were performed on a widely used, challenging long distance telephone database. This database consists of data recorded at two different sites, with data from one site much poorer in quality than the other; further, the recording equipment had been inadvertently changed for the latter half of the sessions, resulting in a significantly changed environment. Our study identifies the combination of techniques that provides consistent and significant improvements; our results surpass other published results on the same task. We further verified the results on two other databases and achieved consistent improvements. Detailed results of exhaustive experimentation are presented along with appropriate discussions.

Digital Repository at the University of Maryland

Low Complexity CELP Speech Coding at 4.8 kbps

Author: Kao Yu-Hung
Publication date: 01/01/1992

Low bit rate, high quality speech coding is a vital part of voice telecommunication systems. The introduction of CELP (Codebook Excited Linear Prediction) speech coding in 1982 provided a feasible way to compress speech data to 4.8 kbps with high quality, but the formidable computational complexity required for real-time processing has prevented its wide application. In this thesis, we reduce the computational complexity to 5 MIPS (million instructions per second), which can be handled by even inexpensive DSP chips, while maintaining the same high quality. We hope our contribution can finally make CELP coding a widely applicable technology.
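To show where the computational cost of CELP comes from, here is a minimal sketch of a generic exhaustive codebook search: every candidate excitation vector is passed through the synthesis filter and compared with the target signal. This is a textbook-style illustration under stated assumptions, not the complexity reduction developed in the thesis. The function names, the tiny codebook, and the two-tap impulse response are all hypothetical.

```python
def synthesize(excitation, impulse_response):
    """Filter an excitation vector by convolving it with a truncated
    impulse response of the synthesis filter (zero initial state)."""
    out = [0.0] * len(excitation)
    for i in range(len(excitation)):
        for j in range(min(i + 1, len(impulse_response))):
            out[i] += impulse_response[j] * excitation[i - j]
    return out


def search_codebook(target, codebook, impulse_response):
    """Return (index, gain, error) of the codevector whose filtered,
    optimally scaled version is closest to the target in squared error."""
    target_energy = sum(t * t for t in target)
    best = (-1, 0.0, float("inf"))
    for index, codevector in enumerate(codebook):
        filtered = synthesize(codevector, impulse_response)
        energy = sum(v * v for v in filtered)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        correlation = sum(t * v for t, v in zip(target, filtered))
        gain = correlation / energy
        error = target_energy - correlation * correlation / energy
        if error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical toy data: three candidate excitations, a two-tap filter.
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
impulse_response = [1.0, 0.5]
target = synthesize([0, 2, 0, 0], impulse_response)
best_index, best_gain, best_error = search_codebook(target, codebook, impulse_response)
```

The nested loops make the cost grow with codebook size times the square of the vector length, and the search must be repeated for every subframe of speech, which is why real-time CELP was demanding for the DSP hardware of the time.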

Digital Repository at the University of Maryland